Apparatus and method for summarizing video information and processing program for
summarizing video information



(57) A summary reproducing apparatus (100), which is capable of reproducing a summary accurately
for each type of video information and of reducing a burden in generating digest information,
comprising a sound feature amount extraction unit (102) for obtaining a sound feature amount on the
basis of a preset parameter from entered audio/video information, a genre information obtaining unit
(103) for obtaining genre information from additional information added to the entered audio/video
information, a decision parameter setting unit (106) for setting an optimum parameter for extracting
a sound feature amount on the basis of genre information, and a control unit for deciding digest
segments to be extracted in stored audio/video information on the basis of a sound feature amount
suitable for the preset parameter and for controlling a reproduction unit (107) on the basis of the
digest segments, wherein a summary is reproduced by using a parameter optimized for each genre.
where contents are switched and a part that follows the scene change part shows the beginning of the next contents,
especially, often shows an outline of the contents concerned, which indicates a feature part of the content information.
[0013] In addition, in video information added to the content information concerned such as a sport-watching program,
an exciting part of the contents often causes frequent scene changes and therefore intervals of the scene changes
indicate feature parts of the content information.

[0014] In this manner, characteristics of the video information contained in the content information concerned depend
upon a type of the content information. 

[0015] Accordingly, in the present invention, thresholds in classifying a plurality of content sections used as reference
to the decision of the partial video information can be optimized on the basis of the identification information in the
content information, thereby enabling an accurate extraction of the partial video information to be extracted even for
a different type of content information so as to obtain digest information based on the contents of the video information.
[0016] Further, the partial video information to be extracted can be extracted accurately only by optimizing the thresholds,
thereby enabling an easy decision of the partial video information without a need for changing a processing
operation of the partial video information for each type of content information.

[0017] The above object of the present invention can be achieved by a video information summarizing apparatus of 
the present invention for extracting one or more pieces of partial video information as some parts of video information 
from the video information to which audio information is added on the basis of the audio information and for generating 
digest information having a shorter time length of the video information on the basis of the partial video information 
extracted. The apparatus is provided with: an obtaining device which obtains identification information for identifying 
a type of video information externally; a decision device which classifies the audio information added to the video 
information into a plurality of audio sections by using optimized thresholds and which decides the partial video infor- 
mation to be extracted on the basis of the classified audio sections; an optimization device which sets optimum values 
to one or a plurality of thresholds used for classifying the audio information into the plurality of audio sections on the 
basis of the obtained identification information; and a generation device which generates the digest information by 
extracting the decided partial video information from the video information. 

[0018] According to the present invention, an obtaining device obtains identification information for identifying a type 
of video information, an optimization device sets optimum values to one or more thresholds in the audio information 
on the basis of the identification information, a decision device classifies the video information into a plurality of audio 
sections by the optimized thresholds and decides partial video information to be extracted on the basis of the classified 
audio sections concerned, and a generation device generates digest information on the basis of the decided partial 
video information. 
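
The interplay of the optimization device and the decision device described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `Thresholds`, `GENRE_THRESHOLDS`, `optimize_thresholds` and `classify_sections`, the normalized 0.0-1.0 audio level scale, and every numeric threshold value are hypothetical and are not taken from the specification, which only states that thresholds are optimized according to the identification information.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Thresholds:
    silent_max: float  # levels at or below this are treated as silent
    noise_min: float   # levels at or above this are treated as noise (e.g. cheers)


# Hypothetical per-genre values on a normalized 0.0-1.0 audio level scale.
GENRE_THRESHOLDS: Dict[str, Thresholds] = {
    "news": Thresholds(silent_max=0.05, noise_min=0.60),
    "sports": Thresholds(silent_max=0.05, noise_min=0.85),
}
DEFAULT_THRESHOLDS = Thresholds(silent_max=0.05, noise_min=0.70)


def optimize_thresholds(genre: str) -> Thresholds:
    """Pick thresholds from the identification (genre) information."""
    return GENRE_THRESHOLDS.get(genre, DEFAULT_THRESHOLDS)


def classify_sections(levels: List[float], th: Thresholds) -> List[Tuple[str, int, int]]:
    """Group consecutive frames with the same label into (label, start, end) runs.

    `end` is exclusive; indices are frame positions in `levels`.
    """
    def label(level: float) -> str:
        if level <= th.silent_max:
            return "silent"
        if level >= th.noise_min:
            return "noise"
        return "other"

    sections: List[Tuple[str, int, int]] = []
    for i, level in enumerate(levels):
        lab = label(level)
        if sections and sections[-1][0] == lab:
            prev = sections[-1]
            sections[-1] = (lab, prev[1], i + 1)  # extend the current run
        else:
            sections.append((lab, i, i + 1))      # start a new run
    return sections
```

With the same audio levels, a higher `noise_min` for the hypothetical "sports" genre yields a shorter noise section than for "news", which is the effect the genre-dependent optimization is meant to achieve.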

[0019] In general, a feature of the audio information added to the video information depends upon a genre of a TV 
or other program. 

[0020] For example, a news program has a silent part between news pieces. In other words, in the news program 
the silent part indicates a part where a scene is changed over or a part where contents are switched and a part that 
follows the silent part shows the beginning of the next contents, especially, often shows an outline of the contents 
concerned, which indicates a feature part of the video information. 

[0021] In addition, in video information having cheer sounds in background noise forming the audio information added 
to the video information concerned such as a sport-watching program, the cheer sounds in the audio information will 
be extremely high in audio level in the exciting part of the contents and therefore the audio level of the cheer sounds 
indicates a feature part of the video information. Further, a sport program has no or very little silent part while always 
having cheer sounds in background noise, and therefore there is a need for setting a higher value to a threshold of an
audio section indicating an exciting part of the contents than that to other video information.

[0022] In this manner, sound characteristics of the audio information added to the video information concerned
depend upon a type of the video information.

[0023] Accordingly, in the present invention, thresholds in classifying a plurality of audio sections used as reference 
to the decision of partial video information can be optimized on the basis of the identification information in the video 
information, thereby enabling an accurate extraction of the partial video information to be extracted even for a different 
type of video information so as to obtain digest information based on the contents of the video information.
[0024] Further, the partial video information to be extracted can be extracted accurately only by optimizing the thresholds,
thereby enabling an easy decision of the partial video information without a need for changing a processing
operation of the partial video information for each type of video information.

[0025] In one aspect of the present invention, the decision device decides the partial video information to be extracted 
on the basis of at least a time-base position of at least any one of the plural types of classified audio sections.
[0026] According to this aspect, preferably the decision device decides the partial video information to be extracted 
on the basis of at least a time-base position of at least any one of the plural types of audio sections classified in the 
video information. 
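
One way to read "time-base position" is that the end of a silent section marks where the next contents begin, as in the news program example. The sketch below illustrates that reading only. The function name `segments_after_silence`, the fixed clip length `clip_len`, and the clamping to `total_len` are assumptions made for illustration and are not stated in the specification.

```python
from typing import List, Optional, Tuple

Section = Tuple[str, float, float]  # (label, start_sec, end_sec)


def segments_after_silence(
    sections: List[Section],
    clip_len: float = 10.0,            # hypothetical fixed segment length
    total_len: Optional[float] = None,  # optional length of the whole program
) -> List[Tuple[float, float]]:
    """Return one candidate segment starting where each silent section ends."""
    segments: List[Tuple[float, float]] = []
    for label, _start, end in sections:
        if label != "silent":
            continue
        seg_start = end
        seg_end = end + clip_len
        if total_len is not None:
            seg_end = min(seg_end, total_len)  # do not run past the program end
        if seg_end > seg_start:
            segments.append((seg_start, seg_end))
    return segments
```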

[0027] In general, since the audio information added to the video information shows feature parts such as exciting 
set to each partial video information on the basis of the obtained identification information, a setting device sets the
optimized importance to the partial video information, and a generation device generates digest information on the 
basis of the decided partial video information and importance. 

[0040] In general, a feature of the video information added to the content information depends upon a genre of a TV
or other program.

[0041] For example, a news program has a scene change part between news pieces. In other words, in the news 
program, the scene change part indicates a part where a scene is changed over or a part where contents are switched 
and a part that follows the scene change part shows the beginning of the next contents, especially, often shows an 
outline of the contents concerned, which indicates a feature part of the content information. Therefore, this part is very 
important in comparison with sections other than the scene change sections.

[0042] In addition, in video information added to the content information concerned such as a sport-watching program,
an exciting part of the contents often causes frequent scene changes and therefore intervals of the scene changes 
indicate feature parts of the content information. 

[0043] In this manner, characteristics of the video information contained in the content information concerned depend 
upon a type of the content information.

[0044] Accordingly, in the present invention, the importance in generating the digest information on the basis of the 
partial video information can be optimized on the basis of the identification information in the content information, 
thereby enabling an accurate extraction of the partial video information to be extracted even for a different type of 
content information so as to obtain digest information based on the contents of the content information. 
[0045] The above object of the present invention can be achieved by a video information summarizing apparatus of 
the present invention for extracting one or more pieces of partial video information as some parts of video information 
from the video information to which audio information is added on the basis of the audio information and for generating 
digest information having a shorter time length of the video information on the basis of the partial video information 
extracted and importance. The apparatus is provided with: an obtaining device which obtains identification information 
for identifying a type of the video information; a decision device which classifies the video information into a plurality 
of audio sections on the basis of thresholds in the audio information and which decides the partial video information 
to be extracted on the basis of the classified sections; an optimization device which optimizes the importance set to 
each of the partial video information on the basis of the obtained identification information; a setting device which sets 
the optimized importance to each of the partial video information; and a generation device which generates the digest 
information by extracting the decided partial video information from the video information on the basis of the importance. 
[0046] According to the present invention, an obtaining device obtains identification information for identifying a type 
of video information, a decision device classifies the video information into a plurality of audio sections on the basis of 
thresholds to decide partial video information to be extracted, an optimization device optimizes the importance set to 
the partial video information on the basis of the obtained identification information, a setting device sets the optimized 
importance to the partial video information, and a generation device generates digest information on the basis of the 
decided partial video information and importance. 
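
The roles of the optimization device and the setting device for the importance can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the `Segment` type, the table `IMPORTANCE_BY_GENRE`, the function names, and all weight values are hypothetical. The specification only states that the importance is optimized on the basis of the identification information and then set to each piece of partial video information.

```python
from typing import Dict, List, NamedTuple


class Segment(NamedTuple):
    start: float             # seconds
    end: float               # seconds
    basis: str               # audio section the segment was decided from: "silent" or "noise"
    importance: float = 0.0


# Hypothetical weights: silent-based segments matter more for news,
# noise (cheer) based segments matter more for sports.
IMPORTANCE_BY_GENRE: Dict[str, Dict[str, float]] = {
    "news": {"silent": 1.0, "noise": 0.3},
    "sports": {"silent": 0.2, "noise": 1.0},
}
DEFAULT_IMPORTANCE: Dict[str, float] = {"silent": 0.5, "noise": 0.5}


def optimize_importance(genre: str) -> Dict[str, float]:
    """Optimization step: choose importance weights from the genre."""
    return IMPORTANCE_BY_GENRE.get(genre, DEFAULT_IMPORTANCE)


def set_importance(segments: List[Segment], weights: Dict[str, float]) -> List[Segment]:
    """Setting step: attach the optimized importance to each segment."""
    return [seg._replace(importance=weights.get(seg.basis, 0.0)) for seg in segments]
```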

[0047] In general, a feature of the audio information added to the video information depends upon a genre of a TV 
or other program. 

[0048] For example, a news program has a silent part between news pieces. In other words, in the news program, 
the silent part indicates a part where a scene is changed over or a part where contents are switched and a part that 
follows the silent part shows the beginning of the next contents, especially, often shows an outline of the contents 
concerned, which indicates a feature part of the video information. Therefore, this part is very important in comparison 
with other audio sections such as noise sections. 

[0049] In addition, in video information having cheer sounds in background noise forming the audio information added
to the video information concerned such as a sport-watching program, the cheer sounds in the audio information will
be extremely high in sound level in exciting parts of the program contents, by which the sound level of the cheer sounds 
indicates a feature part of the video information. Further, a sport-watching program has no or very little silent part while 
always having cheer sounds in background noise, and therefore there is a need for setting a higher value to a threshold 
of an audio section indicating an exciting part of the contents than that to other video information and a need for 
changing settings of the importance in summary reproduction according to a section used as reference when the 
summary is reproduced by extracting the exciting contents accurately. 

[0050] In this manner, sound characteristics of the audio information added to the video information concerned de- 
pend upon a type of the video information. 

[0051] Accordingly, in the present invention, the importance in generating the digest information based on the partial 
video information can be optimized on the basis of the identification information in the video information, thereby en- 
abling an accurate extraction of the partial video information to be extracted even for a different type of video information 
so as to obtain digest information based on the contents of the video information. 

[0052] In one aspect of the present invention, if the decision device decides the partial video information to be ex- 
is provided with: an obtaining process for obtaining identification information for identifying a type of the content
information; a decision process for classifying the content information into a plurality of content sections by using optimized
thresholds and for deciding the partial video information to be extracted on the basis of the classified content sections;
an optimization process for setting optimum values to the one or more thresholds used for classifying the content
information into the plurality of content sections on the basis of the obtained identification information; and a generation
process for extracting the decided partial video information from the video information to generate the digest informa- 
tion. 

[0065] According to the present invention, an obtaining process is to obtain identification information for identifying
a type of content information, an optimization process is to set optimum values to one or more thresholds in the video
information on the basis of the identification information, a decision process is to classify the content information into
a plurality of content sections by the optimized thresholds and to decide the partial video information to be extracted
on the basis of the classified content sections concerned, and a generation process is to generate the digest information
on the basis of the decided partial video information. 

[0066] In general, a feature of the video information contained in the content information depends upon a genre of 
a TV or other program.

[0067] For example, a news program has a so-called scene change part between news pieces. In other words, in
the news program, the scene change part indicates a part where a scene is changed over or a part
where contents are switched and a part that follows the scene change part shows the beginning of the next contents,
especially, often shows an outline of the contents concerned, which indicates a feature part of the content information.
[0068] In addition, in video information added to the content information concerned such as a sport-watching program,
an exciting part of the contents often causes frequent scene changes and therefore intervals of the scene changes 
indicate feature parts of the content information. 

[0069] In this manner, characteristics of the video information contained in the content information concerned depend
upon a type of the content information. 

[0070] Accordingly, in the present invention, thresholds in classifying a plurality of content sections used as reference 
to the decision of partial video information can be optimized on the basis of the identification information in the content 
information, thereby enabling an accurate extraction of the partial video information to be extracted even for a different 
type of content information so as to obtain digest information based on the contents of the content information. 
[0071] Further, the partial video information to be extracted can be extracted accurately only by optimizing the thresh- 
olds, thereby enabling an easy decision of the partial video information without a need for changing a processing 
operation of the partial video information for each type of content information. 

[0072] The above object of the present invention can be achieved by a video information summarizing method of 
the present invention for extracting one or more pieces of partial video information as some parts of video information 
from the video information to which audio information is added on the basis of the audio information and for generating 
digest information having a shorter time length of the video information on the basis of the partial video information 
extracted. The method is provided with: an obtaining process for obtaining identification information for identifying a 
type of the video information; a decision process for classifying the audio information added to the video information 
into a plurality of audio sections by using optimized thresholds and for deciding the partial video information to be 
extracted on the basis of the classified audio sections; an optimization process for setting optimum values to the one
or more thresholds used for classifying the audio information into the plurality of audio sections on the basis of the 
obtained identification information; and a generation process for extracting the decided partial video information from 
the video information to generate the digest information. 

[0073] According to the present invention, an obtaining process is to obtain identification information for identifying 
a type of video information, an optimization process is to set optimum values to one or more thresholds in the audio 
information on the basis of the identification information, a decision process is to classify the video information into a 
plurality of audio sections by the optimized thresholds and to decide partial video information to be extracted on the 
basis of the classified audio sections concerned, and a generation process is to generate digest information on the basis of the
decided partial video information. 

[0074] In general, a feature of the audio information added to the video information depends upon a genre of a TV 
or other program.

[0075] For example, a news program has a silent part between news pieces. In other words, in the news program 
the silent part indicates a part where a scene is changed over or a part where contents are switched and a part that 
follows the silent part shows the beginning of the next contents, especially, often shows an outline of the contents
concerned, which indicates a feature part of the video information.
[0076] In addition, in video information having cheer sounds in background noise forming the audio information added
to the video information concerned such as a sport-watching program, the cheer sounds in the audio information will 
be extremely high in sound level in the exciting part of the contents and therefore the sound level of the cheer sounds 
indicates a feature part of the video information. Further, a sport program has no or very little silent part while always 
[0092] Therefore, in the present invention, if the identification information of the video information shows an identification
of video information having cheer sounds in background noise such as a sport-watching program, cheer sound
sections can be detected accurately by using optimized thresholds for the video information, thereby enabling a user 
to obtain digest information based on the contents of the video information. 

[0093] The above object of the present invention can be achieved by a video information summarizing method of 
the present invention for extracting one or more pieces of partial video information as some parts of video information 
from content information made of audio information and the video information and for generating digest information 
having a shorter time length of the video information on the basis of the partial video information extracted and impor- 
tance. The method is provided with: an obtaining process for obtaining identification information for identifying a type 
of the content information; a decision process for classifying the content information into a plurality of content sections 
on the basis of thresholds in the content information and for deciding the partial video information to be extracted; an
optimization process for optimizing the importance set to each of the partial video information on the basis of the 
obtained identification information; a setting process for setting the optimized importance to each of the partial video 
information; and a generation process for extracting the decided partial video information from the video information 
on the basis of the importance to generate the digest information. 

[0094] According to the present invention, an obtaining process is to obtain identification information for identifying 
a type of content information, a decision process is to classify the content information into a plurality of content sections 
on the basis of thresholds and to decide partial video information to be extracted, an optimization process is to optimize 
the importance set to each partial video information on the basis of the obtained identification information, a setting
process is to set the optimized importance to the partial video information, and a generation process is to generate 
digest information on the basis of the decided partial video information and importance. 

[0095] In general, a feature of the video information added to the content information depends upon a genre of a TV 
or other program. 

[0096] For example, a news program has a scene change part between news pieces. In other words, in the news 
program, the scene change part indicates a part where a scene is changed over or a part where contents are switched 
and a part that follows the scene change part shows the beginning of the next contents, especially, often shows an 
outline of the contents concerned, which indicates a feature part of the content information. Therefore, this part is very 
important in comparison with sections other than the scene change sections. 

[0097] In addition, in video information added to the content information concerned such as a sport-watching program,
an exciting part of the contents often causes frequent scene changes and therefore intervals of the scene changes 
indicate feature parts of the content information. 

[0098] In this manner, characteristics of the video information contained in the content information concerned depend
upon a type of the content information. 

[0099] Accordingly, in the present invention, the importance in generating the digest information on the basis of the 
partial video information can be optimized on the basis of the identification information in the content information,
thereby enabling an accurate extraction of the partial video information to be extracted even for a different type of 
content information so as to obtain digest information based on the contents of the content information. 
[0100] The above object of the present invention can be achieved by a video information summarizing method of 
the present invention for extracting one or more pieces of partial video information as some parts of video information 
from the video information to which audio information is added on the basis of the audio information and for generating 
digest information having a shorter time length of the video information on the basis of the partial video information 
extracted. The method is provided with: an obtaining process for obtaining identification information for identifying a
type of the video information; a decision process for classifying the video information into a plurality of audio sections 
on the basis of thresholds in the audio information and for deciding the partial video information to be extracted on the 
basis of the classified sections; an optimization process for optimizing the importance set to each of the partial video 
information on the basis of the obtained identification information; a setting process for setting the optimized importance
to each of the partial video information; and a generation process for extracting the decided partial video information 
from the video information on the basis of the importance to generate the digest information. 

[0101] According to the present invention, an obtaining process is to obtain identification information for identifying 
a type of video information, a decision process is to classify the video information into a plurality of audio sections on 
the basis of thresholds and to decide partial video information to be extracted, an optimization process is to optimize 
the importance set to each partial video information on the basis of the obtained identification information, a setting
process is to set the optimized importance to the partial video information, and a generation process is to generate 
digest information on the basis of the decided partial video information and importance. 

[0102] In general, a feature of the audio information added to the video information depends upon a genre of a TV 
or other program. 

[0103] For example, a news program has a silent part between news pieces. In other words, in the news program,
the silent part indicates a part where a scene is changed over or a part where contents are switched and a part that 
identification information [...] the basis
to different sound levels [...] the importance in summary reproduction varies according
sections. [...] importance set to the partial video information decided based on the silent

[...] depend upon a type of the video information concerned and it
mization process is to optimize the importance set to the partial video information decided based on the cheer sound
sections. 

According to this aspect, preferably, if the obtained identification information shows an identification of video
information having cheer sounds in background noise forming the audio information, the decision process is to obtain
cheer sound sections having cheer sounds when the audio information is classified into the plurality of the audio sections
and to decide the partial video information on the basis of the cheer sound section concerned, and the optimization
process is to optimize the importance set to the partial video information decided based on the cheer sound section.
Characteristics of the video information depend upon a type of the video information concerned, and it
is important to reproduce the sections having loud cheer sounds accurately in summary reproduction in the video
information having cheer sounds in background noise forming the audio information.

Accordingly, in the present invention, if the identification information of the video information shows an identification
of video information having cheer sounds in background noise forming the audio information, it is possible to
optimize the importance of partial video information decided based on sections having cheer sounds, thereby enabling
an accurate extraction of the partial video information to be extracted so as to obtain digest information based on the
contents of the video information.

[0119] In the drawings:

FIG. 1 is a block diagram showing a structure of a summary reproducing apparatus according to the present
invention;

FIG. 2 is a graph of assistance in explaining a principle of detecting a silent section and a noise section in an
embodiment;

FIG. 3 is a graph of assistance in explaining a principle of detecting a plurality of noise sections in the embodiment;

FIG. 4 is a graph of assistance in explaining a principle of deciding the start and stop time of a segment based on
a noise section;

FIG. 5 is a graph of assistance in explaining a principle of deciding the start and stop time of a segment based on
a silent section;

FIG. 6 is a flowchart showing a summary reproducing operation in the embodiment; and

FIG. 7 is a block diagram showing a structure of a conventional summary reproducing apparatus.

(I) Embodiment

[0120] The embodiment of the present invention will now be described hereinafter with reference to the
drawings.

[0121] The embodiment is carried out by applying the present invention to a summary reproducing apparatus for
summarizing and reproducing audio/video information such as a television broadcasting program.

The following describes the structure of the summary reproducing apparatus according to the embodiment.

FIG. 1 is a block diagram showing the structure of the summary reproducing apparatus according to the
embodiment.

The summary reproducing apparatus 100 of the embodiment shown in FIG. 1 takes in digital audio/video information
transmitted from a communications line or received at a receive unit, not shown. Then, the summary reproducing
apparatus 100 obtains classification information (hereinafter, referred to as genre information) of the audio/video
information concerned from the inputted digital audio/video information.

Further, the summary reproducing apparatus 100 extracts a feature amount of the inputted audio/video information
(hereinafter, referred to as an audio feature amount) on the basis of a plurality of preset thresholds (hereinafter,
referred to as parameters) and sets a threshold used as reference (hereinafter, referred to as a decision
parameter) on the basis of the genre information obtained from the inputted audio/video information. Then, the summary
reproducing apparatus 100 selects the audio feature amount extracted by the parameter suitable for the set decision
parameter concerned and decides digest segments to be extracted (hereinafter, referred to as a decision process of
digest segments).

As described above, the summary reproducing apparatus 100 decides digest segments to be extracted and
performs summary reproduction of the audio/video information on the basis of the decided digest segments.

The process to decide digest segments to be extracted is carried out as follows: potential digest segments
(hereinafter, referred to as digest segment candidates) are listed first, and then digest segments to be extracted are
narrowed down from the listed digest segment candidates to decide the digest segments.
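
The two-stage decision (list candidates, then narrow them down) might be sketched as below. The narrowing rule shown here, a greedy pick in descending importance until a user-specified summary reproducing time is filled, is an assumption made for illustration; this passage does not specify the rule. The names `Candidate` and `narrow_down` are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[float, float, float]  # (start_sec, end_sec, importance)


def narrow_down(candidates: List[Candidate], summary_time: float) -> List[Candidate]:
    """Greedily keep the most important candidates that fit into summary_time."""
    chosen: List[Candidate] = []
    used = 0.0
    # Visit candidates from the highest importance to the lowest.
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        duration = cand[1] - cand[0]
        if used + duration <= summary_time:
            chosen.append(cand)
            used += duration
    # Reproduce the chosen digest segments in their original time order.
    return sorted(chosen, key=lambda c: c[0])
```

A candidate that does not fit is skipped rather than truncated, so a shorter, less important candidate can still fill the remaining time.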
[0141] The operation unit 105 allows a user to instruct storage control of the audio/video information, instruct
reproduction of the stored audio/video information, and enter a summary reproducing time at the time of summary
reproduction. With an input of these instructions to the control unit 108, the control unit 108 controls each unit according to
these instructions.

[0142] The decision parameter setting unit 106 receives an input of the genre information outputted from the storage
unit 104. The decision parameter setting unit 106 sets a decision parameter for use in a decision process of digest
segments to be extracted by the control unit 108, specifically, an optimum audio level on the basis of the inputted genre
information, and outputs a value (audio level) of the set decision parameter to the control unit 108.
[0143] The decision parameter setting unit 106 optimizes the importance in deciding the digest segments to be
extracted on the basis of the inputted genre information and outputs the optimized value or a parameter for the
optimization to the control unit 108. The optimization of the importance will be described later.

[0144] The reproduction unit 107 receives an input of the digital audio/video information outputted from the storage
unit 104. The reproduction unit 107 demultiplexes and decodes the inputted multiplex audio/video information into the
video information and the audio information and then reproduces a summary in accordance with the instructions from
the control unit 108. In addition, the reproduction unit 107 outputs the reproduced audio signals and video signals to
the display unit 109.

[0145] The control unit 108 controls the storage into the storage unit 104 in accordance with instructions inputted
from the operation unit 105 and decides digest segments described later on the basis of the audio feature amount and
the parameter set by the audio feature amount extraction unit 102 and the decision parameter setting unit 106. Then
the control unit 108 performs control of the reproduction operation of the reproduction unit 107 on the basis of the
decided digest segments.

[0146] The display unit 109 receives an input of the audio signals and the video signals outputted from the
reproduction unit 107. The display unit 109 displays the inputted video signals on a monitor screen or the like while
amplifying the audio signals by means of a speaker or the like.

[0147] Referring next to FIGS. 2 and 3, the following describes the audio feature amount extraction process according
to this embodiment.

[0148] It should be noted that FIGS. 2 and 3 are graphs of assistance in explaining a principle of detecting a silent 
section and a noise section in the embodiment. 

[0149] In general, the audio information added to the audio/video information plays an important role in summarizing
the audio/video information in shorter time than the time length of the audio/video information recorded or provided 
over a communications line or the like. 

[0150] For example, in a television broadcasting program, a noise section indicates an exciting part of the program 
while a silent section indicates a part where a scene is changed over or where program contents are switched. 
[0151] Specifically, if the program is a news program, a silent section or so-called "interval (pause)" is taken
at the time of switching news contents, and the part that follows the "pause" shows the next contents, so that part
will be a feature part of the video information. In particular, the part that follows the silent section shows the
beginning of the next contents, and often shows an outline of the contents concerned.

[0152] On the other hand, if the program is a sport-watching program, responses from spectators appear in the
background noise as shouts and cheers. In an exciting scene (a home run scene in a baseball game or a goal
scene in a soccer game), the voices of the announcer and the spectators naturally become louder, so that the audio
level will be much higher than in the other scenes. The part including the exciting scene can therefore be regarded
as a feature part of the video information.

[0153] Therefore, a detection of silent sections becomes important in the audio/video information having silent sec- 
tions such as a news program. On the other hand, in the audio/video information having cheer sounds in background 
noise such as a sport-watching program, almost no silent section will be detected and more appropriate summary 
reproduction is achieved by detecting noise sections having different thresholds. 

[0154] As mentioned above, the audio section to be extracted, such as a silent section or a noise section, and the
optimum threshold for that audio section vary according to the genre.

[0155] As mentioned above, in this embodiment, the audio feature amount extraction unit 102 previously calculates
an average sound pressure level (power) per unit time in the extracted audio information, and extracts a plurality of
audio sections such as silent sections or noise sections according to a plurality of thresholds, for example, audio levels.
Then, the decision parameter setting unit 106 sets a decision parameter for the extraction on the basis of the inputted
genre information, and the control unit 108 selects an optimum audio section for use in deciding digest segments from
the extracted audio sections according to the parameter suitable for the set decision parameter.
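
The extraction described in [0155] can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it assumes mean squared amplitude per fixed-size frame as the "power per unit time", and the function names, frame size, and threshold values are all hypothetical.

```python
from typing import List, Tuple


def mean_power(samples: List[float], frame: int) -> List[float]:
    """Average power (mean squared amplitude) per frame of `frame` samples.
    The frame plays the role of the 'unit time'; its size is an assumption."""
    return [
        sum(s * s for s in samples[i:i + frame]) / frame
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def find_sections(power: List[float], threshold: float,
                  above: bool) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, in which the power
    stays above the threshold (noise section) or below it (silent section)."""
    sections: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(power):
        hit = p > threshold if above else p < threshold
        if hit and start is None:
            start = i                      # a section opens here
        elif not hit and start is not None:
            sections.append((start, i))    # the section closed at frame i
            start = None
    if start is not None:                  # section still open at the end
        sections.append((start, len(power)))
    return sections


power = mean_power([0.0, 0.0, 0.9, 1.0, 0.8, 0.9, 0.0, 0.0], frame=2)
noise_sections = find_sections(power, threshold=0.5, above=True)
silent_sections = find_sections(power, threshold=0.01, above=False)
```

Running `find_sections` once per preset threshold yields the plurality of audio sections from which the control unit 108 later selects according to the genre.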
[0156] Specifically, if the audio/video information is a sport-watching program, the decision parameter setting unit
106 selects noise sections having higher thresholds (audio levels) than those in a news program, while making settings
for inhibiting a process for silent sections, since a sport-watching program always has cheer sounds in all scenes
and almost no silent section is detected. If it is a news program, the decision parameter setting unit 106
threshold is set higher than the importance of the digest segment decided according to the noise section 2 by using a
weighting function used for setting the importance of the digest segment decided on the basis of the silent section.
[0169] The following describes the decision process of the digest segments to be extracted in this embodiment by
using FIGS. 4 and 5.

[0170] Referring to FIG. 4, there is shown a graph of assistance in explaining a principle of deciding the start and
stop time of a segment based on the noise section. Referring to FIG. 5, there is shown a graph of assistance in explaining
a principle of deciding the start and stop time of a segment based on the silent section.

[0171] As mentioned above, in a news program, a silent section or so-called "interval (pause)" is taken at the
time of switching news contents, and the part that follows the "pause" shows the next contents and is a
feature part of the video information. The part that follows the silent section therefore becomes important.

[0172] If the program is a sport-watching program, since responses from spectators show in background noise such 
as shouts and cheers, an exciting scene will be much higher in audio level than the other scenes, and the part including 
the exciting scene can be regarded as a feature part of the video information. 

[0173] In this manner, since the relative position on the time axis between a silent or noise section and a feature part
of the audio/video information, and their importance, vary, the process to decide the digest segments to be extracted
is performed separately for the silent sections and the noise sections. The following describes the digest
segment decision process according to the embodiment.

[0174] In the digest segment decision process of the embodiment, the start time (STSS_j), stop time (SESS_j), and
importance (IPSS_j) of each digest segment are decided on the basis of a silent section and noise section. It should be
noted that i indicates that the section is the i-th silent or noise section, and j indicates the j-th digest
segment.

[0175] In the digest segment decision process of the embodiment, the start time and importance of each digest 
segment are decided on the basis of a silent or noise section to list digest segment candidates. The digest segment 
candidates are then narrowed down to decide the minimum digest-segment time length, the typical digest-segment 
time length, and the maximum digest-segment time length so as to decide the stop time of each of the narrowed-down 
digest segments. 

[0176] Further, in the digest segment decision process of the embodiment, the section length information (DRSS_j)
of the silent section or the noise section, which has been used as a base for deciding a digest segment, is held. In
the embodiment, after the digest segments are decided once and narrowed down, to decide the stop time it is necessary
to determine whether the digest segment was decided on the basis of the silent section or the noise section, as
described later, and the section length information (DRSS_j) is used for the determination.

[0177] Specifically, in the embodiment, the section length of the noise section used as reference is set for the digest
segment set based on the noise section (DRDN_i = DRSS_j). On the other hand, DRSS_j = 0 is set for the digest segment
based on the silent section.

[0178] Therefore, in the digest segment decision process, when the stop time is decided in a manner described later,
it can be determined that the digest segment is set based on the silent section if DRSS_j = 0, or the noise section if
DRSS_j ≠ 0.
[Setting of Digest Segment in Noise Section]



[0179] As mentioned above, since the noise section shows an exciting part of the program, the noise section itself
becomes important. In the embodiment, as shown in FIG. 4, the start position of the noise section detected by the
detection unit 103 is set as the start position of the digest segment.
[0180] In a sport-watching program, if shouts and cheers from spectators are collected and the collected sound is
contained as background noise in the audio information added to the audio/video information, it will be more effective
in summary reproduction that the reproduction starts from a part a bit previous to the exciting scene. In general, an
exciting part such as a good play and a goal or scoring scene in a sport game has some time delay until the spectators
cheer over the exciting scene, that is, until the noise section appears. For this reason, the start time of the digest
segment based on the noise section in the audio/video information such as on the sport-watching program may be
moved forward by Δt from the actual start time of the noise section.

[0181] On the other hand, the stop time of the digest segment in the noise section is decided on the basis of the end
position of the noise section.

[0182] In view of the contents of the digest segment to be extracted, the end position of the noise section basically
needs to be set at the stop time of the digest segment. However, if the time length of the digest segment to be extracted
is too short, the scene concerned may be made difficult to understand. On the other hand, an unnecessarily long time
length could contain a lot of needless information, and an increase in information amount makes it impossible to
summarize the video information unerringly.
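[0184] For example, as shown in FIG. 4, when the noise section (DN_i (e.g., the noise section a in FIG. 4)) does not
reach the minimum digest-segment time length (DR_Min), the minimum digest-segment time length (DR_Min) is added to the
start time of the digest segment, and the resultant time is set for the stop time of the digest segment.

[0185] When the noise section (DN_i (e.g., the noise section b in FIG. 4)) is equal to or more than the minimum
digest-segment time length (DR_Min), and equal to or less than the maximum digest-segment time length (DR_Max), the
length of the noise section is added to the start time of the digest segment to give the stop time.

[0186] Further, when the noise section (DN_i (e.g., the noise section c in FIG. 4)) exceeds the maximum
digest-segment time length (DR_Max), the typical digest-segment time length (DR_Typ) is added to the start time of the
digest segment to give the stop time.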



If 0 < DRSS_j < DR_Min,

$$ SESS_j = STSS_j + DR_{Min} \quad \text{(Eq. 3)} $$

If DR_Min ≤ DRSS_j ≤ DR_Max,

$$ SESS_j = STSS_j + DRSS_j \quad \text{(Eq. 4)} $$

If DR_Max < DRSS_j,

$$ SESS_j = STSS_j + DR_{Typ} \quad \text{(Eq. 5)} $$
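
The three cases for a noise-based digest segment can be sketched as a small function. This is an illustration under assumptions: the function and parameter names are hypothetical, and the third case is read as adding the typical digest-segment time length, by analogy with the silent-section case described later.

```python
def noise_stop_time(stss: float, drss: float,
                    dr_min: float, dr_typ: float, dr_max: float) -> float:
    """Stop time of a digest segment based on a noise section.

    stss: start time of the digest segment
    drss: length of the noise section (must be > 0 for a noise-based segment)
    """
    if drss <= 0:
        raise ValueError("a noise-based segment needs a positive section length")
    if drss < dr_min:
        return stss + dr_min   # too short: pad to the minimum length
    if drss <= dr_max:
        return stss + drss     # acceptable: stop at the end of the noise section
    return stss + dr_typ       # too long: cut to the typical length (assumed reading)


short_stop = noise_stop_time(10.0, 3.0, dr_min=6.0, dr_typ=12.0, dr_max=18.0)
normal_stop = noise_stop_time(10.0, 9.0, dr_min=6.0, dr_typ=12.0, dr_max=18.0)
long_stop = noise_stop_time(10.0, 30.0, dr_min=6.0, dr_typ=12.0, dr_max=18.0)
```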
or more than the additional minimum silent-section length (DRSA_Min) will be set for the start position of the digest
segment.

[0192] On the other hand, the stop time of the digest segment in the silent section is decided on the basis of the start
position of the silent section that follows the silent section used for setting the start time of the digest segment.
[0193] In this case, the section length of the silent section that follows the silent section used for setting the start
time of the digest segment does not need to be equal to or more than the additional minimum silent-section length
(DRSA_Min). Therefore, all the silent sections detected by the detection unit 103 are searched.
[0194] As in the noise section, the stop time of the digest segment is set in a manner described later using the
minimum digest-segment time length (DR_Min), the typical digest-segment time length (DR_Typ), and the maximum
digest-segment time length (DR_Max).

[0195] For example, as shown in FIG. 5, when the start position of the silent section (DS_{i+1} (e.g., the silent section
a in FIG. 5)), which is detected immediately after the silent section set as the start time of the digest segment, does
not reach the minimum digest-segment time length (DR_Min), the time length of the digest segment is the minimum
digest-segment time length (DR_Min). The minimum digest-segment time length (DR_Min) is added to the start time of the
digest segment, and the resultant time is set for the stop time of the digest segment.

[0196] When the start position of the silent section (DS_{i+1} (e.g., the silent section b in FIG. 5)), which is detected
immediately after the silent section set as the start time of the digest segment, exceeds the minimum digest-segment
time length (DR_Min) but does not reach the maximum digest-segment time length (DR_Max), the start position of the
detected silent section (DS_{i+1}) is set for the stop time of the digest segment.

[0197] Further, when the start position of the silent section (DS_{i+1} (e.g., the silent section c in FIG. 5)), which is
detected immediately after the silent section set as the start time of the digest segment, exceeds the maximum
digest-segment time length (DR_Max), the time length of the digest segment is the typical digest-segment time length
(DR_Typ). The typical digest-segment time length (DR_Typ) is added to the start time of the digest segment, and the
resultant time is set for the stop time of the digest segment.

[0198] In the embodiment, when the stop time of the digest segment is set using the minimum digest-segment time
length (DR_Min), the typical digest-segment time length (DR_Typ), and the maximum digest-segment time length (DR_Max),
the next silent section is detected in the following sequence.

[0199] The silent section (DS_{i+1}) that follows the silent section used as reference to the start time of the digest
segment is detected in the following sequence of operations. First of all, it is detected whether the start position of the
silent section (DS_{i+1}) detected immediately after the silent section (DS_i) is equal to or more than the minimum
digest-segment time length (DR_Min) and equal to or less than the maximum digest-segment time length (DR_Max). If the start
position does not exist within the range, it is then detected whether the start position of the silent section (DS_{i+1})
detected immediately after the silent section (DS_i) exists within the minimum digest-segment time length (DR_Min). If
the start position does not exist within the range, the silent section (DS_{i+1}) detected immediately after the silent
section (DS_i) is determined to be in a range of the maximum digest-segment time length (DR_Max) or more.
[0200] In other words, the stop time of the j-th digest segment in the i-th silent section is determined as follows:
[0201] If the start position (ST) of the silent section (DS_{i+1}) was found in the section [DR_Min, DR_Max],

$$ SESS_j = ST \quad \text{(Eq. 6)} $$

[0202] If the start position (ST) of the silent section (DS_{i+1}) was found in the section [0, DR_Min] rather than the
section [DR_Min, DR_Max],

$$ SESS_j = STSS_j + DR_{Min} \quad \text{(Eq. 7)} $$

[0203] If the start position (ST) of the silent section (DS_{i+1}) was not found in the section [0, DR_Max],

$$ SESS_j = STSS_j + DR_{Typ} \quad \text{(Eq. 8)} $$
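
The search order for a silent-section-based digest segment can be sketched as follows. This is an illustration under assumptions: the sections [DR_Min, DR_Max] and [0, DR_Min] are interpreted as offsets measured from the start time of the digest segment, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def silent_stop_time(stss: float, silent_starts: Sequence[float],
                     dr_min: float, dr_typ: float, dr_max: float) -> float:
    """Stop time of a digest segment based on a silent section.

    stss: start time of the digest segment
    silent_starts: start positions of all detected silent sections
    """
    # Offsets of the later silent sections, measured from the segment start.
    offsets = sorted(st - stss for st in silent_starts if st > stss)
    # First preference: a silent section starting inside [dr_min, dr_max],
    # even if an earlier one starts before dr_min.
    in_range = [off for off in offsets if dr_min <= off <= dr_max]
    if in_range:
        return stss + in_range[0]   # Eq. 6: stop at that start position
    # Second: a silent section starting before dr_min.
    if any(off < dr_min for off in offsets):
        return stss + dr_min        # Eq. 7: pad to the minimum length
    # Otherwise nothing was found up to dr_max.
    return stss + dr_typ            # Eq. 8: use the typical length


stop_a = silent_stop_time(100.0, [103.0, 110.0], 6.0, 12.0, 18.0)
stop_b = silent_stop_time(100.0, [103.0], 6.0, 12.0, 18.0)
stop_c = silent_stop_time(100.0, [130.0], 6.0, 12.0, 18.0)
```

In the first call the silent section at 103 is skipped in favour of the one at 110, which is the preference for a section inside [DR_Min, DR_Max] that the search order calls for.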

[0204] In the sequence of detection of the silent section (DS_{i+1}), even when the next silent section (DS_{i+1}) exists in
the minimum digest-segment time length (DR_Min), if the start position of another silent section (e.g., DS_{i+n}, where
n ≥ 2) is equal to or more than the minimum digest-segment time length (DR_Min), and equal to or less than the maximum
digest-segment time length (DR_Max), the next silent section (DS_{i+1}) that exists in the minimum digest-segment time
length (DR_Min) is not handled as the silent section that follows the silent section (DS_i) used as reference to the start
$$ IPSS_j = f(DRDS_i) \quad \text{(Eq. 9)} $$

In the equation 9, f(·) is a weighting function, and in the embodiment, (Eq. 1) or (Eq. 2) is used as described.
[Process to Narrow Down Digest Segment Candidates]

The summary reproduction process may be performed on all the digest segment candidates. However, the digest
segments to be processed are narrowed down for purposes of reducing the amount of processing.

For example, the number of digest segment candidates is narrowed down by the following equation 10:

$$ NP_{new} = \mathrm{Min}(\mathrm{Int}(k_1 \times (S / DR_{LMin})),\ NP_{old}) \quad \text{(Eq. 10)} $$

Here, NP_new represents the number of digest segment candidates after they are narrowed down, and DR_LMin represents
the minimum limit time.

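[Setting of Minimum/Typical/Maximum Digest-Segment Time Length]

As mentioned above, if a digest segment has a time length that is too short, the scene concerned may be difficult to
understand. On the other hand, an unnecessarily long time length could contain a lot of needless information, so the
video information cannot be summarized unerringly. Therefore, the minimum digest-segment time length (DR_Min), the
typical digest-segment time length (DR_Typ), and the maximum digest-segment time length (DR_Max) need to be set
appropriately.
[0217] For example, in the embodiment, the minimum digest-segment time length (DR_Min), the typical digest-segment
time length (DR_Typ), and the maximum digest-segment time length (DR_Max) are set in the following manner
so that the contents of each digest segment to be extracted can be understood.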
[0218] Considering that the digest segment is made easily visible to the user, the minimum digest-segment time
length (DR_Min) is set as shown in equation 11 so that the digest segment will have a relatively long time length. The
typical digest-segment time length (DR_Typ) and the maximum digest-segment time length (DR_Max) are calculated by
multiplying the minimum digest-segment time length (DR_Min) calculated from the equation 11 by a constant as shown
in equations 12 and 13.



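$$ DR_{Min} = \mathrm{Max}(DR_{LMin},\ (K_2 \times (S / NP_{new}))) \quad \text{(Eq. 11)} $$

$$ DR_{Typ} = DR_{Min} \times K_{T1} \quad \text{(Eq. 12)} $$

$$ DR_{Max} = DR_{Min} \times K_{T2} \quad \text{(Eq. 13)} $$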

[0219] Here, K_T1 and K_T2 are proportional constants, and Max(a, b) means that the larger value out of a and b is
selected. Further, K_2 (≥ 1) is a coefficient for use in deciding the minimum time of each digest segment. The larger the
value of K_2, the longer the minimum time and the smaller the number of digest segments. For example, K_2 = 1, K_T1 =
2, and K_T2 = 3 in the embodiment.
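
The chain from the candidate count to the three time lengths can be sketched as follows. This is a hedged illustration: equation 10 is taken in the form NP_new = Min(Int(k_1 × (S / DR_LMin)), NP_old), S is assumed to be the specified digest time, and the default value of k_1 as well as all function and parameter names are hypothetical. The defaults K_2 = 1, K_T1 = 2, and K_T2 = 3 are the values given for the embodiment.

```python
from typing import Tuple


def segment_lengths(s: float, np_old: int, dr_lmin: float,
                    k1: float = 1.0, k2: float = 1.0,
                    kt1: float = 2.0, kt2: float = 3.0
                    ) -> Tuple[int, float, float, float]:
    """Return (NP_new, DR_Min, DR_Typ, DR_Max).

    s: specified digest time (assumed meaning of S)
    np_old: number of digest segment candidates before narrowing down
    dr_lmin: minimum limit time DR_LMin
    """
    np_new = min(int(k1 * (s / dr_lmin)), np_old)    # Eq. 10 (as read here)
    if np_new <= 0:
        raise ValueError("no candidate fits into the specified digest time")
    dr_min = max(dr_lmin, k2 * (s / np_new))         # Eq. 11
    dr_typ = dr_min * kt1                            # Eq. 12
    dr_max = dr_min * kt2                            # Eq. 13
    return np_new, dr_min, dr_typ, dr_max


few_candidates = segment_lengths(s=300.0, np_old=50, dr_lmin=4.0)
many_candidates = segment_lengths(s=300.0, np_old=100, dr_lmin=4.0)
```

With few candidates the minimum length grows beyond DR_LMin (300 / 50 = 6), whereas with many candidates the count is capped at 75 and the minimum length stays at DR_LMin.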

[Merging of Digest Segments] 

[0220] In the embodiment, when two or more digest segments coincide with each other, the digest segments are
merged into a digest segment. In this case, the importance of the digest segment generated by merging two or more
digest segments takes the highest value of importance (IPSS_j) from among values for all the digest segments (see the
following equation 14).



$$ IPSS_j = \mathrm{Max}(IPSS_j,\ IPSS_{j+n}) \quad \text{(Eq. 14)} $$

Further, if STSS_j < STSS_{j+n} and SESS_j ≥ STSS_{j+n} for two digest segments SS_j and SS_{j+n}, the following equation is
obtained:

$$ SESS_j = SESS_{j+n} \quad \text{(Eq. 15)} $$

[0221] Thus, even when a digest segment is of little importance, if the digest segment coincides with another digest 
segment of much importance, the digest segment of little importance can be complemented by that of much importance. 
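
A minimal sketch of the merging step follows. It assumes that two segments coincide when the later one starts no later than the earlier one stops, and it extends the stop time with a maximum so that a fully contained segment does not shorten the merged one; both choices, like the `Segment` type, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float       # STSS_j
    stop: float        # SESS_j
    importance: float  # IPSS_j


def merge_segments(segments: List[Segment]) -> List[Segment]:
    """Merge coinciding digest segments, keeping the highest importance."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and seg.start <= merged[-1].stop:
            last = merged[-1]
            last.stop = max(last.stop, seg.stop)                    # cf. Eq. 15
            last.importance = max(last.importance, seg.importance)  # Eq. 14
        else:
            # Copy so that the caller's objects are not modified.
            merged.append(Segment(seg.start, seg.stop, seg.importance))
    return merged


result = merge_segments([
    Segment(0.0, 10.0, 1.0),
    Segment(8.0, 15.0, 5.0),
    Segment(20.0, 25.0, 2.0),
])
```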

[Decision of Digest Segment]

[0222] In the embodiment, the digest segment candidates are selected in descending order of importance to achieve
the specified digest time in the final process.
[0223] The selection of digest segment candidates is continued until the total time of the selected digest segment
candidates exceeds the specified digest time.

[0224] When the digest segments are decided in descending order of importance, since the time length varies from
segment to segment, the total time of the selected digest segments may exceed the specified digest time. If exceeding
the specified digest time becomes a problem, measures are taken against the overtime, such as sharing
the overtime among the decided digest segments and then subtracting the shared time from the stop time of each digest
segment.
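
The final selection and the overtime measure can be sketched as follows. This is an illustration only: the overtime is shared equally among the selected segments, which is one possible reading of "share the overtime", the loop stops as soon as the specified digest time is reached or exceeded, and the function name and tuple layout are hypothetical. The sketch does not guard against a share that is longer than a segment.

```python
from typing import List, Tuple


def select_segments(candidates: List[Tuple[float, float, float]],
                    digest_time: float) -> List[Tuple[float, float]]:
    """candidates: (start, stop, importance) tuples.
    Returns the selected (start, stop) pairs in time order."""
    chosen: List[List[float]] = []
    total = 0.0
    # Take candidates in descending order of importance.
    for start, stop, _imp in sorted(candidates, key=lambda c: c[2], reverse=True):
        if total >= digest_time:
            break
        chosen.append([start, stop])
        total += stop - start
    # Share any overtime equally and cut it from each stop time.
    overtime = total - digest_time
    if overtime > 0 and chosen:
        share = overtime / len(chosen)
        for seg in chosen:
            seg[1] -= share
    return sorted((s, e) for s, e in chosen)


digest = select_segments(
    [(0.0, 10.0, 3.0), (20.0, 30.0, 9.0), (40.0, 50.0, 5.0)],
    digest_time=15.0,
)
```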

[0225] The following describes the summary reproducing operation of the embodiment by using FIG. 6.
[0226] Referring to FIG. 6, there is shown a flowchart of the summary reproducing operation of this embodiment.
Assuming that the audio/video information required for summary reproduction is already stored in the storage unit 104,
the operation is carried out when the user instructs the summary reproduction.
[0227] First, when the user enters an instruction for summary reproduction by using the operation unit 105, the audio
feature amount extraction unit 102 obtains audio feature amounts, that is, a plurality of audio sections on the basis of
the preset parameters after an input of the audio information in the audio/video information through the demultiplexer
101 to the storage unit 104 (step S11), and the genre information obtaining unit 103 obtains genre information from
of the digest segments to be extracted

sections set by the decision parameter setting unit 106 (step S14).

[0230] Finally, when the digest segments to be extracted are decided in step S14, the control unit 108 controls the
[0239]

(II) Modification

[0240] The following describes a modification according to the present invention.
video information.

[0242] Specifically, first, the above digest reproduction can be performed by detecting scene changes in the video
information, and then repeating the reproduction only for a fixed period of time (for example, 10 sec.) with the timing
of each of the detected scene changes as a start time.

[0243] It should be noted, however, that the detected scene changes can be weighted (given different
importance) to adjust the entire time period necessary for the summary reproduction. Preferably, the importance is
decided using the time interval between a scene change and the previous one, so that the scene changes to
be used for the summary reproduction are decided in descending order of importance. Furthermore, it is also possible
to optimize the weighting function by using the genre information.
[0244] The following describes the above operation more specifically by giving two examples. 

[For news program] 



[0245] First, the operation will be described by giving an example for a news program. 

[0246] For wide and shallow browsing through the contents of a news program (in other words, reproducing a
summary), it is preferable to select a part following a long scene change interval so as to select as many news contents
as possible for the reproduction. On the other hand, parts following frequent scene changes have almost the same
contents. Therefore, determining that a part accompanied by a long scene change interval is of much importance and
that a part accompanied by a short scene change interval is of little importance, the importance is preferably decided
by using the following arithmetic expression, for example:

f(x) = a × x + b

[For sport program]

[0247] Next, the operation will be described by giving an example for a sport program.

[0248] For example, scenes of little importance in the summary reproduction such as a pitching scene in a baseball
game broadcasting program or a pass scene in a soccer game broadcasting program are often accompanied by long
scene change intervals. On the other hand, scenes of much importance in the summary reproduction such as a hit
scene in a baseball game broadcasting program or a goal scene in a soccer game broadcasting program are often
accompanied by frequent scene changes such as review reproduction of individual scenes or a zoom up of a target
player. Therefore, determining that a part accompanied by a long scene change interval is of little importance and that
a part accompanied by a short scene change interval is of much importance, the importance is preferably decided by
using the following arithmetic expression, for example:

f(x) = (a/x) + b 
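
The two genre-dependent weighting functions can be sketched together as follows. This is an illustration under assumptions: x is taken to be the scene change interval in seconds, the constants a and b default to hypothetical values, and the genre labels and function name are chosen here for illustration.

```python
def scene_change_importance(interval: float, genre: str,
                            a: float = 1.0, b: float = 0.0) -> float:
    """Importance of the part that follows a scene change.

    interval: time since the previous scene change (x in the expressions)
    genre: "news" or "sport" (hypothetical labels)
    """
    if interval <= 0:
        raise ValueError("the scene change interval must be positive")
    if genre == "news":
        return a * interval + b    # long intervals are more important
    if genre == "sport":
        return a / interval + b    # short intervals are more important
    raise ValueError(f"no weighting function defined for genre {genre!r}")


news_long = scene_change_importance(10.0, "news")
news_short = scene_change_importance(2.0, "news")
sport_long = scene_change_importance(10.0, "sport")
sport_short = scene_change_importance(2.0, "sport")
```

The same interval therefore ranks in opposite order for the two genres, which is the effect of optimizing the weighting function by the genre information.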

[0249] While the summary reproducing apparatus 100 is provided with the genre information obtaining unit 103, the
decision parameter setting unit 106, the reproduction unit 107, and the control unit 108 in the embodiment as mentioned
above, it is also possible to provide the control unit 108 with a computer and a record medium such as a hard disk to
store the program for executing the processes corresponding to the components of the summary reproducing apparatus
100 such as the genre information obtaining unit 103, the decision parameter setting unit 106, the reproduction unit
107, and the control unit 108, and to make the computer read the program so as to perform the operations of these
components.

[0250] In this case, the above summary reproduction is performed by getting the computer to work with the stored
program. Furthermore, in this case, the control unit 108 is provided with an obtaining device, an optimization device,
a setting device, a generation device, and a decision device.
Claims



1. A video information summarizing apparatus (100) for extracting one or more pieces of partial video information as
some parts of video information from content information made of audio information and the video information and
for generating digest information having a shorter time length of the video information on the basis of the partial
video information extracted, characterized in that the apparatus comprises:
an obtaining device (103) which obtains identification information for identifying a type of the content
information;

a decision device (108) which classifies the content information into a plurality of content sections by using
optimized thresholds and which decides the partial video information to be extracted on the basis of the
classified content section;

an optimization device (106) which sets optimum values to one or a plurality of the thresholds used for
classifying the content information into the plurality of content sections on the basis of the obtained identification
information; and

a generation device (107, 108) which generates the digest information by extracting the decided partial video
information from the video information.

2. A video information summarizing apparatus (100) for extracting one or more pieces of partial video information as
some parts of video information from the video information to which audio information is added on the basis of the
audio information and for generating digest information having a shorter time length of the video information on
the basis of the partial video information extracted, characterized in that the apparatus comprises:

an obtaining device (103) which obtains identification information for identifying a type of video information 
externally; 

a decision device (108) which classifies the audio information added to the video information into a plurality 
of audio sections by using optimized thresholds and which decides the partial video information to be extracted 
on the basis of the classified audio sections; 

an optimization device (106) which sets optimum values to one or a plurality of thresholds used for classifying
the audio information into the plurality of audio sections on the basis of the obtained identification information;
and

a generation device (107, 108) which generates the digest information by extracting the decided partial video
information from the video information.

3. The apparatus according to claim 2, wherein
the decision device (108) decides the partial video information to be extracted on the basis of at least a time-
base position of at least any one of the plural types of classified audio sections.

4. The apparatus according to claim 2 or 3, wherein
if the identification information shows an identification of video information having silent parts,
the decision device (108) obtains the silent sections at least partially having the silent parts by classifying

the audio information and decides the partial video information at least on the basis of the silent sections, and

the optimization device (106) optimizes the thresholds used when the decision device obtains the silent
sections.

5. The apparatus according to claim 2 or 3, wherein 
if the obtained identification information shows an identification of video information having cheer sounds in 

background noise forming the audio information, 

the decision device (108) obtains the cheer sound sections having the cheer sounds by classifying the audio
information and decides the partial video information at least on the basis of the cheer sound sections, and

the optimization device (106) optimizes the thresholds used when the decision device obtains the cheer
sound sections.

6. A video information summarizing apparatus (100) for extracting one or more pieces of partial video information as
some parts of video information from content information made of audio information and the video information and
for generating digest information having a shorter time length of the video information on the basis of the partial
video information extracted and importance, characterized in that the apparatus comprises:

an obtaining device (103) which obtains identification information for identifying a type of the content
information;

a decision device (108) which classifies the content information into a plurality of content sections on the basis
of thresholds in the content information and which decides the partial video information to be extracted on the
basis of the classified content sections;

an optimization device (106) which optimizes the importance set to each of the partial video information on 
the basis of the obtained identification information; 
a setting device (108) which sets the optimized importance to each of the partial video information; and
a generation device (107, 108) which generates the digest information by extracting the decided partial video
information from the video information on the basis of the importance.



7. A video information summarizing apparatus (100) for extracting one or more pieces of partial video information as
some parts of video information from the video information to which audio information is added on the basis of the
audio information and for generating digest information having a shorter time length of the video information on
the basis of the partial video information extracted and importance, characterized in that the apparatus comprises:

an obtaining device (103) which obtains identification information for identifying a type of the video information;

a decision device (108) which classifies the video information into a plurality of audio sections on the basis of
thresholds in the audio information and which decides the partial video information to be extracted on the basis
of the classified sections;

an optimization device (106) which optimizes the importance set to each of the partial video information on
the basis of the obtained identification information;

a setting device (108) which sets the optimized importance to each of the partial video information; and 
a generation device (107, 108) which generates the digest information by extracting the decided partial video 
information from the video information on the basis of the importance. 

8. The apparatus according to claim 7, wherein

if the decision device (108) decides the partial video information to be extracted on the basis of different
thresholds,

the optimization device (106) optimizes the importance for each of the different thresholds on the basis of
the obtained identification information, and

the setting device (108) sets the optimized importance to the partial video information.



9. The apparatus according to claim 7 or 8, wherein

if the obtained identification information shows an identification of video information having silent parts,

the decision device (108) obtains silent sections at least partially having silent parts by classifying the audio
information and decides the partial video information at least on the basis of the silent sections, and

the optimization device (106) optimizes the importance set to the partial video information decided based on
the silent sections.

10. The apparatus according to claim 7 or 8, wherein

if the obtained identification information shows an identification of video information having cheer sounds in
background noise forming the audio information,

the decision device (108) obtains cheer sound sections having the cheer sounds by classifying the audio
information and decides the partial video information at least on the basis of the cheer sound sections, and

the optimization device (106) optimizes the importance set to the partial video information set based on the
cheer sound sections.
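
The importance handling of claims 7 to 10 can be sketched as follows. This is only a minimal illustration under stated assumptions: the genre names, the weight values, the tuple layout of the candidates and the greedy selection against a time budget are all hypothetical and are not specified by the claims.

```python
# Hypothetical genre -> importance weight per section type (assumed values).
GENRE_IMPORTANCE = {
    "news": {"silent": 1.0, "cheer": 0.0},
    "sport": {"silent": 0.2, "cheer": 1.0},
}


def optimize_importance(genre):
    """Optimization step: choose importance weights from the identification information."""
    return GENRE_IMPORTANCE.get(genre, {"silent": 0.5, "cheer": 0.5})


def set_importance(candidates, weights):
    """Setting step: attach the optimized importance to each candidate segment.

    candidates is a list of (start_s, end_s, kind) tuples, where kind names the
    type of audio section the segment was decided from.
    """
    return [(start, end, weights[kind]) for start, end, kind in candidates]


def generate_digest(scored, max_seconds):
    """Generation step (assumed greedy policy): take the most important segments
    first until the time budget is used, then return them in time order."""
    chosen = []
    remaining = max_seconds
    for start, end, importance in sorted(scored, key=lambda s: (-s[2], s[0])):
        duration = end - start
        if importance > 0 and duration <= remaining:
            chosen.append((start, end))
            remaining -= duration
    return sorted(chosen)


candidates = [(0, 10, "silent"), (40, 55, "cheer"), (90, 100, "cheer"), (120, 130, "silent")]
sport_digest = generate_digest(set_importance(candidates, optimize_importance("sport")), 30)
news_digest = generate_digest(set_importance(candidates, optimize_importance("news")), 30)
```

With the same candidate segments, the sport weights favour the segments decided from cheer sound sections, while the news weights favour those decided from silent sections.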

11. A video information summarizing method for extracting one or more pieces of partial video information as some
parts of video information from content information made of audio information and the video information and for
generating digest information having a shorter time length of the video information on the basis of the partial video
information extracted, characterized in that the method comprises:

an obtaining process for obtaining identification information for identifying a type of the content information;
a decision process for classifying the content information into a plurality of content sections by using optimized
thresholds and for deciding the partial video information to be extracted on the basis of the classified content
sections;

an optimization process for setting optimum values to the one or more thresholds used for classifying the
content information into the plurality of content sections on the basis of the obtained identification information;
and

a generation process for extracting the decided partial video information from the video information to generate
the digest information.
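
The processes of claim 11 can be sketched as follows for the case where the content sections are audio sections. This is a minimal sketch under assumptions: the genre names, the threshold values in dB, the per-frame level input, and the rule that a digest segment starts where a silent section ends are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical per-genre thresholds on the audio level in dB (assumed values).
# None means that the section type is not used for that genre.
GENRE_THRESHOLDS = {
    "news": {"silence_db": -50.0, "cheer_db": None},
    "sport": {"silence_db": None, "cheer_db": -20.0},
}


@dataclass
class Section:
    kind: str   # "silent" or "cheer"
    start: int  # index of the first frame
    end: int    # index one past the last frame


def optimize_thresholds(genre):
    """Optimization process: pick thresholds from the identification information."""
    return GENRE_THRESHOLDS.get(genre, {"silence_db": -45.0, "cheer_db": -15.0})


def label_frame(level_db, thresholds):
    silence, cheer = thresholds["silence_db"], thresholds["cheer_db"]
    if silence is not None and level_db <= silence:
        return "silent"
    if cheer is not None and level_db >= cheer:
        return "cheer"
    return None


def classify_sections(levels_db, thresholds, min_frames=2):
    """Decision process, part 1: group consecutive frames into sections."""
    labels = [label_frame(level, thresholds) for level in levels_db]
    sections, pos = [], 0
    for kind, run in groupby(labels):
        length = len(list(run))
        if kind is not None and length >= min_frames:
            sections.append(Section(kind, pos, pos + length))
        pos += length
    return sections


def decide_segments(sections, n_frames, lead_frames=3):
    """Decision process, part 2 (assumed rule): keep the frames that follow a
    silent section, and keep a cheer section itself."""
    segments = []
    for sec in sections:
        if sec.kind == "silent":
            start, end = sec.end, min(sec.end + lead_frames, n_frames)
        else:
            start, end = sec.start, sec.end
        if start < end:
            segments.append((start, end))
    return segments


def summarize(levels_db, genre):
    thresholds = optimize_thresholds(genre)
    sections = classify_sections(levels_db, thresholds)
    return decide_segments(sections, len(levels_db))


levels = [-30, -60, -55, -30, -28, -31, -29, -10, -12, -30]
news_segments = summarize(levels, "news")
sport_segments = summarize(levels, "sport")
```

The same audio levels yield different digest segments depending on the genre, because only the thresholds selected for that genre take part in the classification.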



12. A video information summarizing method for extracting one or more pieces of partial video information as some
parts of video information from the video information to which audio information is added on the basis of the audio
information and for generating digest information having a shorter time length of the video information on the basis
of the partial video information extracted, characterized in that the method comprises:

an obtaining process for obtaining identification information for identifying a type of the video information;

a decision process for classifying the audio information added to the video information into a plurality of audio
sections by using optimized thresholds and for deciding the partial video information to be extracted on the
basis of the classified audio sections;

an optimization process for setting optimum values to the one or more thresholds used for classifying the audio
information into the plurality of audio sections on the basis of the obtained identification information; and

a generation process for extracting the decided partial video information from the video information to generate
the digest information.

13. The method according to claim 12, wherein the decision process is to decide the partial video information to be
extracted on the basis of at least a time-base position of at least any one of the plural types of classified audio
sections.

14. The method according to claim 12 or 13, wherein

if the obtained identification information shows an identification of video information having silent parts,

the decision process is to obtain the silent sections at least partially having the silent parts when the audio
information is classified into the plurality of audio sections, and

the optimization process is to optimize the thresholds used for obtaining the silent sections.

15. The method according to claim 13 or 14, wherein

if the obtained identification information shows an identification of video information having cheer sounds in
background noise forming the audio information,

the decision process is to obtain the cheer sound sections having the cheer sounds when the audio information is
classified into the plurality of audio sections, and

the optimization process is to optimize the thresholds used for obtaining the cheer sound sections.

16. A video information summarizing method for extracting one or more pieces of partial video information as some
parts of video information from content information made of audio information and the video information and for
generating digest information having a shorter time length of the video information on the basis of the partial video
information extracted and importance, characterized in that the method comprises:

an obtaining process for obtaining identification information for identifying a type of the content information;

a decision process for classifying the content information into a plurality of content sections on the basis of
thresholds in the content information and for deciding the partial video information to be extracted on the
basis of the classified content sections;

an optimization process for optimizing the importance set to each of the partial video information on
the basis of the obtained identification information;

a setting process for setting the optimized importance to each of the partial video information; and

a generation process for extracting the decided partial video information from the video information on the
basis of the importance to generate the digest information.

17. A video information summarizing method for extracting one or more pieces of partial video information as some
parts of video information from the video information to which audio information is added on the basis of the audio
information and for generating digest information having a shorter time length of the video information on the basis
of the partial video information extracted, characterized in that the method comprises:

an obtaining process for obtaining identification information for identifying a type of the video information;

a decision process for classifying the video information into a plurality of audio sections on the basis of
thresholds in the audio information and for deciding the partial video information to be extracted on the basis
of the classified sections;

an optimization process for optimizing the importance set to each of the partial video information on
the basis of the obtained identification information;

a setting process for setting the optimized importance to each of the partial video information; and

a generation process for extracting the decided partial video information from the video information on the
basis of the importance to generate the digest information.

18. The method according to claim 17, wherein

if the decision process is to decide the partial video information to be extracted on the basis of the different
thresholds,

the optimization process is to optimize the importance for each of the different thresholds on the basis of the
obtained identification information; and

the setting process is to set the optimized importance to the partial video information. 
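
One way to read claim 18 is that each threshold used for the decision carries its own importance, and that this importance depends on the genre. The sketch below assumes two silence thresholds per genre and assigns a section the importance of the strictest threshold it satisfies; the genre names, the dB values and the weights are hypothetical.

```python
# Hypothetical genre -> list of (silence threshold in dB, importance) pairs.
GENRE_LEVELS = {
    "news": [(-60.0, 1.0), (-50.0, 0.6)],
    "sport": [(-60.0, 0.4), (-50.0, 0.1)],
}


def importance_for(section_level_db, genre):
    """Return the importance tied to the strictest (lowest) threshold that the
    section level satisfies, or 0.0 if it satisfies none of them."""
    for threshold_db, importance in sorted(GENRE_LEVELS[genre]):
        if section_level_db <= threshold_db:
            return importance
    return 0.0


deep_silence_news = importance_for(-65.0, "news")
shallow_silence_news = importance_for(-55.0, "news")
shallow_silence_sport = importance_for(-55.0, "sport")
```

A section that is silent only under the looser threshold receives a lower importance than one that is silent under the stricter threshold, and both values change with the genre.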

19. The method according to claim 17 or 18, wherein

if the obtained identification information shows an identification of video information having silent parts,

the decision process is to obtain silent sections at least partially having silent parts when the audio information
is classified into the plurality of audio sections and to decide the partial video information on the basis of the silent
section concerned, and

the optimization process is to optimize the importance set to the partial video information decided based on
the silent sections.

20. The method according to claim 17 or 18, wherein

if the obtained identification information shows an identification of video information having cheer sounds in
background noise forming the audio information,

the decision process is to obtain cheer sound sections having cheer sounds when the audio information is
classified into the plurality of audio sections and to decide the partial video information on the basis of the
cheer sound section concerned, and

the optimization process is to optimize the importance set to the partial video information decided based on
the cheer sound sections.